5 research outputs found

    Virtual sensors for erroneous data repair in manufacturing: a machine learning pipeline

    Manufacturing converts raw materials into finished products using machine tools for controlled material removal or deposition. The process can be observed using sensors installed within and around machine tools. These sensors measure quantities such as vibrations, cutting forces, temperature, currents, power consumption, and acoustic emission to diagnose defects and enable zero-defect manufacturing as part of the Industry 4.0 vision. The continuity of high-quality sensor data streams is fundamental to predicting phenomena such as geometric deformations, surface roughness, excessive coolant use, and imminent tool wear with adequate accuracy and appropriate timing. However, in practice, data acquired by some sensors can be of poor quality and unsuitable for prediction due to sensor faults stemming from environmental factors. In this paper, we ask whether erroneous data from a faulty sensor can be repaired using data simultaneously available from redundant sensors that observe the same process. We present a machine learning pipeline to synthesize virtual sensors that can step in for faulty sensors to maintain reasonable quality and continuity in sensor data streams. We have validated the synthesized virtual sensors in four industrial case studies.

    A Systematic Review of Data Quality in CPS and IoT for Industry 4.0

    The Internet of Things (IoT) and Cyber-Physical Systems (CPS) are the backbones of Industry 4.0, where data quality is crucial for decision support. Data quality in these systems can deteriorate due to sensor failures or uncertain operating environments. Our objective is to summarize and assess the research efforts that address data quality in data-centric CPS/IoT industrial applications. We review the state-of-the-art data quality techniques for CPS and IoT in Industry 4.0 through a systematic literature review (SLR). We pose three research questions, define selection and exclusion criteria for primary studies, and extract and synthesize data from these studies to answer our research questions. Our most significant results are (i) the list of data quality issues, their sources, and application domains, (ii) the best practices and metrics for managing data quality, (iii) the software engineering solutions employed to manage data quality, and (iv) the state of the data quality techniques (data repair, cleaning, and monitoring) in the application domains. The results of our SLR can help researchers obtain an overview of existing data quality issues, techniques, metrics, and best practices. We suggest research directions that require attention from the research community for follow-up work.

    On modeling green data center clusters

    The energy consumption of data center clusters is rapidly increasing, making them the fastest-growing consumers of electricity worldwide. Renewable electricity sources, especially solar energy as a clean and abundant resource, can be used in many locations to cover their electricity needs and make them "green", that is, powered by photovoltaics. This potential can be explored by predicting solar irradiance and assessing the capacity provision for data center clusters. In this thesis, we develop stochastic models for solar energy: one at the surface of the Earth and a second that models the photovoltaic output current. We then compare them to the state-of-the-art on-off model and validate them against real data. We conclude that the solar irradiance model better captures the multiscale correlations and is suitable for small-scale cases. We then propose a new job life cycle for a complex, real cluster system and a model for data center clusters that supports batch job submissions and considers both impatient and persistent customer behavior. To understand the essential characteristics of computer clusters, we analyze in detail two workload traces of different types: the first is the published, complex Google trace, and the second, simpler one, which serves scientific purposes, is from the Nef cluster located at the Inria Sophia Antipolis research center. We then implement marmoteCore-Q, a tool for the simulation of a family of queueing models based on our multi-server model for data center clusters with abandonments and resubmissions.

    Vers la modélisation de clusters de centres de données vertes

    The energy consumption of data center clusters is rapidly increasing, making them the fastest-growing consumers of electricity worldwide. Renewable electricity sources, especially solar energy as a clean and abundant resource, can be used in many locations to cover their electricity needs and make them "green", that is, powered by photovoltaics. This potential can be explored by predicting solar irradiance and assessing the capacity provision for data center clusters. In this thesis, we develop stochastic models for solar energy: one at the surface of the Earth and a second that models the photovoltaic output current. We then compare them to the state-of-the-art on-off model and validate them against real data. We conclude that the solar irradiance model better captures the multiscale correlations and is suitable for small-scale cases. We then propose a new job life cycle for a complex, real cluster system and a model for data center clusters that supports batch job submissions and considers both impatient and persistent customer behavior. To understand the essential characteristics of computer clusters, we analyze in detail two workload traces of different types: the first is the published, complex Google trace, and the second, simpler one, which serves scientific purposes, is from the Nef cluster located at the Inria Sophia Antipolis research center. We then implement marmoteCore-Q, a tool for the simulation of a family of queueing models based on our multi-server model for data center clusters with abandonments and resubmissions.

    A Conceptual Digital Twin Framework for City Logistics

    Urban logistics is one of the key elements of urban mobility planning. The use of real-time information systems in logistics operations generates an enormous amount of data, nowadays used mainly for the purpose of monitoring and control of large flows of goods. At the same time, urban planners, business stakeholders, and city administrators are in need of adaptive, data-driven decision support solutions to address today's urban logistics problems. Recently, digital twins have received a lot of attention to support advanced experimentation, simulation and decision-making for on-demand logistics operations. Questions still remain on how to realize these for urban logistics management in a mixed public-private stakeholder context. We argue that this lack of a specific framework for city logistics with a model library for data mergers, linking physical and virtual data exchange, can compromise the timely adoption of digital twin technology. We contribute to filling this gap by presenting a systematic review of the literature, proposing a conceptual framework for digital twin applications in urban logistics, and providing use case scenarios for their demonstration. Together, these should advance the technical implementation of digital twins in a sustainable city logistics context.